OneZero-Y (Contributor) commented Sep 24, 2025

What type of PR is this?

refactor: Implement modular candle-binding architecture

What this PR does / why we need it:

Which issue(s) this PR fixes:
part of #59

Detailed Module Structure

1. FFI Interface Layer (ffi/)

C-compatible external interface layer with 10 modules:

ffi/
├── mod.rs                    # Unified FFI exports and documentation
├── types.rs                  # C struct type definitions
├── init.rs                   # Model initialization interface
├── classify.rs               # Classification function interface
├── similarity.rs             # Similarity computation interface
├── tokenization.rs           # Text tokenization interface
├── memory.rs                 # Memory management interface
├── memory_safety.rs          # Memory safety validation
├── validation.rs             # Input validation interface
└── state_manager.rs          # Global state management

Core Features:

  • C Functions: Covering initialization, classification, and memory management
  • Intelligent Routing: Automatic model type detection and optimal path selection
  • Memory Safety: Comprehensive allocation tracking and leak prevention
  • State Management: Global classifier instances and configuration management
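As an illustration of what the FFI surface described above can look like, here is a minimal sketch of a C-compatible classification entry point. The struct layout, function name, and stubbed routing logic are assumptions for illustration, not the actual contents of ffi/classify.rs:

```rust
use std::ffi::{c_char, CStr};

/// C-compatible result struct, mirroring the kind of layout
/// ffi/types.rs might define (field names are illustrative).
#[repr(C)]
pub struct ClassificationResultC {
    pub class_id: i32,
    pub confidence: f32,
}

/// Example of an `extern "C"` function like those in ffi/classify.rs.
/// A null pointer yields a sentinel result instead of undefined behavior.
#[no_mangle]
pub unsafe extern "C" fn classify_text(text: *const c_char) -> ClassificationResultC {
    if text.is_null() {
        return ClassificationResultC { class_id: -1, confidence: 0.0 };
    }
    let input = CStr::from_ptr(text).to_string_lossy();
    // Real code would route to the loaded model; here we stub a result.
    let confidence = if input.is_empty() { 0.0 } else { 0.9 };
    ClassificationResultC { class_id: 0, confidence }
}
```

Callers on the C side would pass a null-terminated string and check `class_id` for the error sentinel.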

2. Core Business Logic Layer (core/)

Pure business logic core with 5 modules:

core/
├── mod.rs                    # Core module exports
├── unified_error.rs          # Unified error handling system
├── config_loader.rs          # Unified configuration loading system
├── similarity.rs             # Core similarity algorithms
└── tokenization.rs           # Core tokenization logic

Unified Error Handling:

  • UnifiedError: Structured error types
  • ConfigErrorType: Configuration-related error subtypes
  • ValidationError: Validation error types
  • config_errors!, validation_error!: Convenience macros
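A minimal sketch of what such an error hierarchy and a convenience macro can look like; the variant names and macro body are illustrative, not the actual unified_error.rs:

```rust
use std::fmt;

/// Illustrative shape of the unified error hierarchy.
#[derive(Debug)]
pub enum UnifiedError {
    Config(ConfigErrorType),
    Validation(String),
    Model(String),
}

/// Configuration-related error subtypes (names assumed).
#[derive(Debug)]
pub enum ConfigErrorType {
    MissingField(String),
    ParseFailure(String),
}

impl fmt::Display for UnifiedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UnifiedError::Config(e) => write!(f, "config error: {:?}", e),
            UnifiedError::Validation(msg) => write!(f, "validation error: {}", msg),
            UnifiedError::Model(msg) => write!(f, "model error: {}", msg),
        }
    }
}

impl std::error::Error for UnifiedError {}

/// A convenience macro in the spirit of `validation_error!`.
macro_rules! validation_error {
    ($($arg:tt)*) => {
        UnifiedError::Validation(format!($($arg)*))
    };
}
```

The macro keeps call sites terse while still producing the structured error type.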

Unified Configuration Loading:

  • UnifiedConfigLoader: Centralized configuration loading
  • Specialized loaders: IntentConfigLoader, PIIConfigLoader, SecurityConfigLoader
  • TokenConfigLoader, LoRAConfigLoader, ModernBertConfigLoader
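A sketch of the loader pattern, assuming one shared trait behind the specialized loaders; the field names and the stubbed parsing below are illustrative, not the project's actual config_loader.rs:

```rust
/// Hypothetical shared interface for the specialized loaders listed above.
pub trait ConfigLoader {
    type Config;
    fn load(path: &str) -> Result<Self::Config, String>;
}

/// Illustrative intent-classification config (fields assumed).
pub struct IntentConfig {
    pub labels: Vec<String>,
    pub confidence_threshold: f32,
}

pub struct IntentConfigLoader;

impl ConfigLoader for IntentConfigLoader {
    type Config = IntentConfig;
    fn load(path: &str) -> Result<IntentConfig, String> {
        // Real code parses config.yaml; stubbed here for illustration.
        if path.is_empty() {
            return Err("empty config path".into());
        }
        Ok(IntentConfig {
            labels: vec!["greeting".into(), "question".into()],
            confidence_threshold: 0.7,
        })
    }
}
```

Each task (PII, security, tokens, LoRA, ModernBERT) would get its own `Config` type behind the same trait, so callers load any of them uniformly.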

3. Model Architectures Layer (model_architectures/)

Dual-path model architecture layer with 13 files:

model_architectures/
├── mod.rs                    # Unified model exports
├── traits.rs                 # Core trait definitions
├── unified_interface.rs      # Unified interface layer
├── model_factory.rs          # Intelligent model factory
├── config.rs                 # Model configuration system 
├── routing.rs                # Dual-path intelligent routing
│
├── traditional/              # Traditional fine-tuning path
│   ├── mod.rs                # Traditional model exports
│   ├── bert.rs               # BERT model implementation
│   ├── modernbert.rs         # ModernBERT model implementation
│   └── base_model.rs         # Traditional model base classes
│
└── lora/                     # LoRA parameter-efficient path
    ├── mod.rs                # LoRA model exports
    ├── bert_lora.rs          # BERT+LoRA integration 
    └── lora_adapter.rs       # LoRA adapter core

Core Trait Architecture:

  • CoreModel: Basic model operations (model_type, forward, get_config)
  • PathSpecialization: Path-specific optimizations (supports_parallel, confidence_threshold)
  • ConfigurableModel: Configuration-driven loading (load)
  • LoRACapable: LoRA capabilities (inherits CoreModel)
  • TraditionalModel: Traditional model capabilities (inherits CoreModel)

4. Classifier Systems Layer (classifiers/)

Classifier systems layer with 11 files:

classifiers/
├── mod.rs                    # Unified classifier exports
├── unified.rs                # Dual-path unified classifier
│
├── traditional/              # Traditional classifier implementations
│   ├── mod.rs                # Traditional classifier exports
│   ├── modernbert_classifier.rs # ModernBERT specialized classifier
│   └── batch_processor.rs    # Traditional batch processor
│
└── lora/                     # LoRA classifier implementations
    ├── mod.rs                # LoRA classifier exports
    ├── intent_lora.rs        # Intent classification
    ├── pii_lora.rs           # PII detection 
    ├── security_lora.rs      # Security analysis 
    ├── token_lora.rs         # Token classification
    └── parallel_engine.rs    # Parallel execution engine

Multi-Task Parallel Processing:

  • Intent || PII || Security concurrent execution
  • Specialized token-level classifier supporting NER and other tasks
  • Traditional path providing high-accuracy stable classification

5. Utilities Layer (utils/)

Intelligent memory management utilities layer with 2 files:

utils/
├── mod.rs                    # Utility module exports
└── memory.rs                 # Dual-path memory optimization 

netlify bot commented Sep 24, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: 6cd74d5
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68d8bf4f4a772900086b10d0
😎 Deploy Preview: https://deploy-preview-207--vllm-semantic-router.netlify.app


github-actions bot commented Sep 24, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/src/classifiers/lora/intent_lora.rs
  • candle-binding/src/classifiers/lora/mod.rs
  • candle-binding/src/classifiers/lora/parallel_engine.rs
  • candle-binding/src/classifiers/lora/pii_lora.rs
  • candle-binding/src/classifiers/lora/security_lora.rs
  • candle-binding/src/classifiers/lora/token_lora.rs
  • candle-binding/src/classifiers/mod.rs
  • candle-binding/src/classifiers/traditional/batch_processor.rs
  • candle-binding/src/classifiers/traditional/mod.rs
  • candle-binding/src/classifiers/traditional/modernbert_classifier.rs
  • candle-binding/src/classifiers/unified.rs
  • candle-binding/src/core/config_loader.rs
  • candle-binding/src/core/mod.rs
  • candle-binding/src/core/similarity.rs
  • candle-binding/src/core/tokenization.rs
  • candle-binding/src/core/unified_error.rs
  • candle-binding/src/ffi/classify.rs
  • candle-binding/src/ffi/init.rs
  • candle-binding/src/ffi/memory.rs
  • candle-binding/src/ffi/memory_safety.rs
  • candle-binding/src/ffi/mod.rs
  • candle-binding/src/ffi/similarity.rs
  • candle-binding/src/ffi/state_manager.rs
  • candle-binding/src/ffi/tokenization.rs
  • candle-binding/src/ffi/types.rs
  • candle-binding/src/ffi/validation.rs
  • candle-binding/src/model_architectures/config.rs
  • candle-binding/src/model_architectures/lora/bert_lora.rs
  • candle-binding/src/model_architectures/lora/lora_adapter.rs
  • candle-binding/src/model_architectures/lora/mod.rs
  • candle-binding/src/model_architectures/mod.rs
  • candle-binding/src/model_architectures/model_factory.rs
  • candle-binding/src/model_architectures/routing.rs
  • candle-binding/src/model_architectures/traditional/base_model.rs
  • candle-binding/src/model_architectures/traditional/bert.rs
  • candle-binding/src/model_architectures/traditional/mod.rs
  • candle-binding/src/model_architectures/traditional/modernbert.rs
  • candle-binding/src/model_architectures/traits.rs
  • candle-binding/src/model_architectures/unified_interface.rs
  • candle-binding/src/utils/memory.rs
  • candle-binding/src/utils/mod.rs
  • candle-binding/src/lib.rs

📁 config

Owners: @rootfs
Files changed:

  • config/config.yaml


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.


    Ok(Self {
        bert_classifier: classifier,
        confidence_threshold: 0.7,

Contributor Author:
The threshold should be read from config.yaml instead of being hardcoded.

    let intent = if predicted_class < self.intent_labels.len() {
        self.intent_labels[predicted_class].clone()
    } else {
        format!("UNKNOWN_{}", predicted_class)

Collaborator:
should just fail if no class found?

Contributor Author:
Yes, it should fail when no class is found rather than returning a default value. This ensures the caller knows that classification failed.
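One way to make the lookup fail instead of fabricating an UNKNOWN_* label; a plain String error is used here for brevity, where the real code would use the crate's error type:

```rust
/// Return the label for a predicted class, or an error when the
/// class index has no corresponding label.
fn resolve_label(labels: &[String], predicted_class: usize) -> Result<String, String> {
    labels
        .get(predicted_class)
        .cloned()
        .ok_or_else(|| {
            format!(
                "predicted class {} has no label ({} labels known)",
                predicted_class,
                labels.len()
            )
        })
}
```

`slice::get` makes the out-of-range case explicit, so the caller must handle the error instead of silently receiving a placeholder string.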

    let intent = if predicted_class < self.intent_labels.len() {
        self.intent_labels[predicted_class].clone()
    } else {
        format!("UNKNOWN_{}", predicted_class)

Collaborator:
fail here too?

Contributor Author:
I will maintain consistent error handling here as well: when the classification result cannot be determined, it should return an error.

    // Skip "O" (Outside) labels - class 0 typically means no PII
    if class_idx > 0 && class_idx < self.pii_types.len() {
        has_pii = true;
        max_confidence = max_confidence.max(confidence);

Collaborator:
For PII, we need a confidence for each occurrence. The max confidence inflates the significance of the others.

Contributor Author:
Agreed; each detected PII occurrence will carry its own confidence, not influenced by the max value.
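A sketch of per-occurrence reporting, assuming token-level (class, confidence) predictions; each detected span keeps its own confidence instead of being folded into a single max (types and names here are illustrative):

```rust
/// One detected PII span with its own confidence.
#[derive(Debug, PartialEq)]
struct PiiOccurrence {
    token_index: usize,
    class_idx: usize,
    confidence: f32,
}

/// Collect every non-"O" prediction as an independent occurrence.
fn collect_pii(preds: &[(usize, f32)], num_types: usize) -> Vec<PiiOccurrence> {
    let mut out = Vec::new();
    for (i, &(class_idx, confidence)) in preds.iter().enumerate() {
        // class 0 is the "O" (Outside) label: not PII
        if class_idx > 0 && class_idx < num_types {
            out.push(PiiOccurrence { token_index: i, class_idx, confidence });
        }
    }
    out
}
```

Callers can then threshold or report each occurrence separately, rather than having a 0.95 hit inflate the apparent confidence of a 0.60 one.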


    // Calculate severity score based on confidence and threat type
    let severity_score = if is_threat {
        confidence * 0.9 // High severity for detected threats

Collaborator:
why not just return raw confidence?

Contributor Author:
You're right; it should return the raw confidence instead of the processed severity_score.

    })?;

    // Determine if threat is detected based on predicted class
    let is_threat = predicted_class > 0; // Assuming class 0 is "benign" or "safe"

Collaborator:
can you use the class label or class number instead of a hardcoded value?

Contributor Author:
Yes, I will use class labels or class numbers instead of hardcoded values.


    for word in &words {
        // Generate contextual embedding for each word
        // This simulates BERT's contextualized embeddings based on word content

Collaborator:
can you apply the BERT embedding instead of the simulation?

Contributor Author:
Yes, I will use real BERT embeddings instead of simulated data.

    }

    /// Generate contextual embedding based on word content
    fn generate_contextual_embedding(&self, word: &str) -> Result<Tensor> {

Collaborator:
should use the model embedding

Contributor Author:
Yes, I will use the actual model embeddings.

Collaborator:
great! I am a bit tied up these days, will come back to review soon-ish.

OneZero-Y force-pushed the refactor/modular-candle-binding-architecture branch from 3697826 to 76b941f on September 25, 2025, 09:38

    pub fn forward(&self, _input: &Tensor) -> Result<Tensor> {
        // Simplified placeholder implementation
        Tensor::zeros(&[1, 768], candle_core::DType::F32, &self.device)

Collaborator:
Can you add a comment? 768 is the ModernBERT-base hidden size; the large model has a hidden size of 1024.
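The reviewer's point generalizes: the hidden size should come from the model configuration rather than being hardcoded. A sketch of that idea (the struct and field names are illustrative, though `hidden_size` is the conventional name in BERT-family configs):

```rust
/// Illustrative slice of a model config carrying the hidden size
/// (768 for ModernBERT-base, 1024 for ModernBERT-large).
struct ModelConfig {
    hidden_size: usize,
}

/// Build the output shape from the config instead of a literal 768.
fn output_shape(config: &ModelConfig, batch: usize) -> Vec<usize> {
    vec![batch, config.hidden_size]
}
```

The placeholder `Tensor::zeros` call could then use `output_shape(...)`, and the same code would serve both base and large checkpoints.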

    /// Task type
    pub task: ClassificationTask,
    /// Classification result
    pub result: String,

Collaborator:
are the probabilities in ClassificationResult missing?

    /// This method computes embeddings for each candidate individually,
    /// which is suitable for small candidate lists. For large lists,
    /// consider batch processing.
    pub fn find_most_similar(

Collaborator:
just a thought for enhancement: maybe this can return top_k
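The suggested top_k variant could rank precomputed (index, score) pairs like this; this is a hedged sketch that assumes the similarity scores have already been computed from the embeddings:

```rust
/// Keep the k highest-scoring candidates, best first.
/// `scores` pairs each candidate index with its similarity score.
fn top_k_similar(scores: &[(usize, f32)], k: usize) -> Vec<(usize, f32)> {
    let mut ranked = scores.to_vec();
    // Descending by score; partial_cmp is safe for non-NaN similarities.
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked.truncate(k);
    ranked
}
```

`find_most_similar` then becomes the `k = 1` special case, so both can share one code path.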

    use_reasoning: false

    # Router Configuration for Dual-Path Selection
    router:

Collaborator:
can this go to a config file of its own and be loaded by the master config.yaml?
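One possible shape for the suggested split, where the master config.yaml holds only a pointer to a separate router config file. The `router_config:` key and the line-based lookup are assumptions for illustration, not the project's actual loader (which would use a YAML parser):

```rust
use std::fs;

/// Resolve a `router_config: <path>` entry in the master config text
/// and load the referenced file.
fn load_router_config(master: &str) -> Result<String, String> {
    let path = master
        .lines()
        .find_map(|l| l.trim().strip_prefix("router_config:").map(|p| p.trim().to_string()))
        .ok_or_else(|| "master config has no router_config entry".to_string())?;
    fs::read_to_string(&path).map_err(|e| format!("failed to read {}: {}", path, e))
}
```

This keeps the router section independently editable while the master config stays the single entry point.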


rootfs commented Sep 27, 2025

@OneZero-Y this is a quite major refactoring, would it be ok we merge to a dev branch first and we add test cases to validate before merging to main branch?

OneZero-Y (Contributor Author):

@rootfs Yes, of course.


rootfs commented Sep 27, 2025

@OneZero-Y great! i created branch feat-candle-refactoring, can you open PR against this one?

https://github.com/vllm-project/semantic-router/tree/feat-candle-refactoring



refactor: Implement modular candle-binding architecture

- Restructure codebase into modular layers (core/, ffi/, model_architectures/, classifiers/)
- Add unified error handling and configuration loading systems
- Implement dual-path architecture for traditional and LoRA models
- Add comprehensive FFI layer with memory safety

Maintains backward compatibility while enabling future model integrations.

Signed-off-by: OneZero-Y <[email protected]>
OneZero-Y force-pushed the refactor/modular-candle-binding-architecture branch from 76b941f to 6cd74d5 on September 28, 2025, 04:53
OneZero-Y (Contributor Author):

@rootfs I have created a new PR.
